Creators/Authors contains: "Liang, Kaizhao"

  1. Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods exploit the low-rank structure of gradients to project optimizer states into a subspace using a projection matrix obtained by singular value decomposition (SVD). However, convergence of these algorithms depends heavily on the update rules for the projection matrix. This work provides the first convergence guarantee for arbitrary update rules of the projection matrix, applicable to any optimizer that can be analyzed with Hamiltonian Descent, including common ones such as Lion and Adam. Guided by this theoretical understanding, the authors propose Online Subspace Descent, a new family of subspace descent optimizers that do not rely on SVD: instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates it with online PCA. This approach is flexible and adds minimal overhead to training. In experiments pretraining LLaMA models from 60M to 7B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity and better downstream task performance than state-of-the-art low-rank training methods across settings, narrowing the gap to full-rank baselines. (A minimal sketch of the projection-plus-online-PCA update appears after this list.)
  2. Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As one might expect from the output of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, and Polyak and Nesterov momentum, yet it does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer across a wide range of tasks, its theoretical basis remains uncertain, and this lack of theoretical clarity limits opportunities to further enhance and extend it. This work aims to demystify Lion. Through both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach to minimizing a general loss function $f(x)$ while enforcing the bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through its decoupled weight decay, where $\lambda$ is the weight decay coefficient. Our analysis is made possible by a new Lyapunov function for the Lion updates, and it extends to a broader family of Lion-$\phi$ algorithms, in which the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\phi$, yielding solutions of the general composite optimization problem $\min_x f(x) + \phi^*(x)$. These findings provide valuable insight into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms. (A minimal sketch of the Lion update and the box constraint it implies appears below.)
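To make the first abstract's update concrete, below is a minimal sketch of a subspace-descent step in which the optimizer state (here, momentum) lives in an r-dimensional subspace defined by a projection matrix P, and P is refreshed with an online-PCA step rather than a periodic SVD. The toy quadratic loss, the hyperparameters (lr, beta, pca_lr), and the power-iteration/QR refresh used for the online PCA are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of memory-efficient subspace descent with an online-PCA projection
# update, assuming a toy quadratic loss f(W) = 0.5 * ||W - A||_F^2; the hyperparameters
# and the power-iteration-style PCA refresh are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                              # weight shape (m x n), subspace rank r
A = rng.normal(size=(m, n))                      # toy regression target
W = np.zeros((m, n))                             # weight matrix being trained
P = np.linalg.qr(rng.normal(size=(m, r)))[0]     # orthonormal projection matrix (m x r)
M = np.zeros((r, n))                             # momentum stored in the r-dim subspace

lr, beta, pca_lr = 0.1, 0.9, 1e-3

for step in range(500):
    G = W - A                                    # full gradient of the toy loss
    # The optimizer state only sees the compressed gradient P^T G (r x n, not m x n);
    # the resulting step is lifted back to full size through P.
    M = beta * M + (1 - beta) * (P.T @ G)
    W -= lr * (P @ M)
    # Online PCA refresh of P: one power-iteration-style step toward the top-r
    # subspace of G G^T, then re-orthonormalize.  This replaces the periodic SVD
    # used by earlier low-rank training methods.
    P = P + pca_lr * (G @ (G.T @ P))
    Q, R = np.linalg.qr(P)
    P = Q * np.where(np.diag(R) < 0, -1.0, 1.0)  # keep column orientation stable

print("toy loss:", 0.5 * np.linalg.norm(W - A) ** 2)
```

The refresh here costs one matrix product plus a thin QR factorization per step, which is the kind of low per-step overhead the abstract points to when it says the approach avoids SVD.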
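The second abstract's constrained-optimization reading of Lion can be seen directly from the published Lion update rule: a sign step on an interpolated momentum plus decoupled weight decay. The sketch below runs that rule on a toy quadratic; the target vector and hyperparameter values are assumptions chosen only to make the $\|x\|_\infty \leq 1/\lambda$ behaviour visible.

```python
# Minimal sketch of the Lion update (sign step + decoupled weight decay) on a toy
# quadratic; the target and hyperparameter values are assumptions for illustration.
# Because |sign(.)| <= 1, the decay term lam * x can only be balanced while
# lam * |x_i| <= 1, which is the bound constraint ||x||_inf <= 1/lam from the abstract.
import numpy as np

def lion_step(x, m, grad, lr=0.03, beta1=0.9, beta2=0.99, lam=0.1):
    c = beta1 * m + (1 - beta1) * grad       # interpolation used for the update direction
    x = x - lr * (np.sign(c) + lam * x)      # sign step plus decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad       # momentum state update
    return x, m

rng = np.random.default_rng(0)
target = np.array([3.0, -3.0, 0.5, 20.0, -20.0])
x = rng.normal(size=5) * 50.0                # start far outside the box [-10, 10]
m = np.zeros_like(x)

for _ in range(3000):
    grad = x - target                        # gradient of f(x) = 0.5 * ||x - target||^2
    x, m = lion_step(x, m, grad)

print(x)                  # roughly [3, -3, 0.5, 10, -10]: large coordinates stop at 1/lam
print(np.abs(x).max())    # stays near 1/lam = 10 (up to lr-sized oscillation)
```

Since $|\text{sign}(\cdot)| \leq 1$, the weight decay can only be balanced once $\lambda |x_i| \leq 1$, so iterates settle inside the box $[-1/\lambda, 1/\lambda]$: coordinates whose unconstrained optimum lies inside the box converge to it, and the rest stick near the boundary, matching the abstract's reading of $\lambda$ as defining a constraint rather than a penalty.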